Appendix for QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries
In Table 2, we show the effect of using different numbers of moment queries (#moment queries). As can be seen from the table, this hyper-parameter has a large impact on the moment retrieval task, where a reasonably small value (e.g., 10) gives better performance. As described in Equation 3 of the main text, Moment-DETR's saliency loss consists of two terms; in Table 3, we study the effect of using these two terms (see the sketch below). We show more correct predictions and failure cases from our Moment-DETR model in Figure 1 and Figure 2.

In Table 4, we show the distribution of annotated saliency scores. We notice that 94.41% of the annotated clips are rated by two or more users as 'Fair' or better (i.e., a score >= 3). To ensure data quality, we require workers to pass our qualification test before participating in our annotation task.
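For concreteness, below is a minimal sketch of a two-term saliency hinge loss of the kind described above, written in PyTorch. The function name, the margin value, and the way the high/low and in/out clip pairs are passed in are our illustrative assumptions, not the paper's reference implementation.

```python
import torch

def saliency_hinge_loss(scores, high_idx, low_idx, in_idx, out_idx, margin=0.2):
    """Sketch: two-term hinge loss over predicted clip saliency scores.

    scores   -- (num_clips,) predicted saliency scores for one video
    high_idx -- index of a higher-rated clip within the ground-truth moment
    low_idx  -- index of a lower-rated clip within the same moment
    in_idx   -- index of a clip inside the ground-truth moment
    out_idx  -- index of a clip outside the moment
    margin   -- hinge margin (placeholder value, not the paper's setting)
    """
    # Term 1: within the moment, the higher-annotated clip should outscore
    # the lower-annotated clip by at least `margin`.
    within_term = torch.clamp(margin + scores[low_idx] - scores[high_idx], min=0)
    # Term 2: a clip inside the moment should outscore a clip outside it.
    in_out_term = torch.clamp(margin + scores[out_idx] - scores[in_idx], min=0)
    return within_term + in_out_term
```

An ablation like the one in Table 3 then amounts to dropping either `within_term` or `in_out_term` from the sum.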
Crowdsourcing Annotations for Visual Object Detection
Su, Hao (Stanford University) | Deng, Jia (Stanford University) | Fei-Fei, Li (Stanford University)
A large number of images with ground-truth object bounding boxes are critical for learning object detectors, which is a fundamental task in computer vision. In this paper, we study strategies to crowdsource bounding box annotations. The core challenge of building such a system is to effectively control the data quality with minimal cost. Our key observation is that drawing a bounding box is significantly more difficult and time-consuming than giving answers to multiple-choice questions. Thus, quality control through additional verification tasks is more cost-effective than consensus-based algorithms. In particular, we present a system that consists of three simple sub-tasks: a drawing task, a quality verification task, and a coverage verification task. Experimental results demonstrate that our system is scalable, accurate, and cost-effective.
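As a concrete reading of the three-sub-task design described in this abstract, the simulation below shows one plausible control flow: a box enters the ground truth only after passing a cheap quality check, and drawing stops once coverage is confirmed. All names and numbers here (drawing_task, quality_task, coverage_task, the 0.8 acceptance rate, max_rounds) are our own illustrative assumptions, not the authors' system.

```python
import random
from dataclasses import dataclass, field

@dataclass
class BoxTask:
    image_id: str
    category: str
    accepted_boxes: list = field(default_factory=list)

# In the real system each step is a separate crowd task; here they are
# simulated stubs so the control flow can run end to end.
def drawing_task(task):
    """Drawing task: one worker draws a single bounding box (x, y, w, h)."""
    return tuple(random.randint(0, 100) for _ in range(4))

def quality_task(task, box):
    """Quality verification: another worker answers a multiple-choice
    question: is this box tight and correct for the category?"""
    return random.random() < 0.8  # assumed worker acceptance rate

def coverage_task(task):
    """Coverage verification: a worker checks whether every instance of
    the category already has an accepted box (stubbed as a box count)."""
    return len(task.accepted_boxes) >= 2

def annotate(task, max_rounds=10):
    # Alternate expensive drawing with cheap verification: a box is kept
    # only if it passes quality verification, and drawing stops as soon
    # as coverage is confirmed, bounding the total cost.
    for _ in range(max_rounds):
        if coverage_task(task):
            break
        box = drawing_task(task)
        if quality_task(task, box):
            task.accepted_boxes.append(box)
    return task.accepted_boxes

print(annotate(BoxTask("img_001", "dog")))
```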